Observational Studies of Software Engineering Using Data from Software Repositories
Authors
Daniel Pierce Delorey, Department of Computer Science (Master of Science thesis)
Abstract
Data for empirical studies of software engineering can be difficult to obtain. Extrapolations from small controlled experiments to large development environments are tenuous, and observation tends to change the behavior of the subjects. In this thesis we propose the use of data gathered from software repositories in observational studies of software engineering. We present tools we have developed to extract data from CVS repositories and the SourceForge Research Archive. We use these tools to gather data from 9,999 Open Source projects. By analyzing these data, we are able to provide insights into the structure of Open Source projects. For example, we find that the vast majority of the projects studied have never had more than three contributors and that the vast majority of authors studied have never contributed to more than one project. However, there are projects that have had up to 120 contributors in a single year and authors who have contributed to more than 20 projects, which raises interesting questions about team dynamics in the Open Source community. We also use these data to empirically test the belief that productivity is constant in terms of lines of code per programmer per year regardless of the programming language used. We find that yearly programmer productivity is not constant across programming languages; rather, developers using higher-level languages tend to write fewer lines of code per year than those using lower-level languages.
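The three measures the abstract reports (the largest number of contributors a project had in a single year, the number of projects each author contributed to, and lines of code per programmer per year by language) can be illustrated as simple aggregations over commit records. The following is a minimal Python sketch under stated assumptions: the `Commit` record and its fields are hypothetical, and using lines added per commit as a stand-in for lines of code written is an assumption, not the schema or counting method of the thesis's CVS and SourceForge Research Archive tools.

```python
from collections import defaultdict
from statistics import mean
from typing import NamedTuple


class Commit(NamedTuple):
    """One commit record. Hypothetical fields, not the thesis's data schema."""
    project: str
    author: str
    year: int
    language: str     # primary language of the project (assumed available)
    lines_added: int  # assumed proxy for lines of code written


def max_yearly_contributors(commits):
    """Largest number of distinct authors each project had in any single year."""
    authors = defaultdict(set)  # (project, year) -> set of distinct authors
    for c in commits:
        authors[(c.project, c.year)].add(c.author)
    result = {}
    for (project, _year), names in authors.items():
        result[project] = max(result.get(project, 0), len(names))
    return result


def projects_per_author(commits):
    """Number of distinct projects each author has committed to."""
    projects = defaultdict(set)  # author -> set of projects
    for c in commits:
        projects[c.author].add(c.project)
    return {author: len(p) for author, p in projects.items()}


def yearly_loc_by_language(commits):
    """Mean lines added per programmer per year, grouped by project language."""
    # First total each author's lines for one year in one language...
    totals = defaultdict(int)  # (language, author, year) -> lines added
    for c in commits:
        totals[(c.language, c.author, c.year)] += c.lines_added
    # ...then average those author-year totals within each language.
    per_language = defaultdict(list)
    for (language, _author, _year), loc in totals.items():
        per_language[language].append(loc)
    return {lang: mean(values) for lang, values in per_language.items()}


if __name__ == "__main__":
    commits = [
        Commit("alpha", "ann", 2004, "C", 1200),
        Commit("alpha", "bob", 2004, "C", 800),
        Commit("alpha", "ann", 2005, "C", 500),
        Commit("beta", "ann", 2005, "Python", 300),
        Commit("beta", "cy", 2005, "Python", 100),
    ]
    print(max_yearly_contributors(commits))  # {'alpha': 2, 'beta': 2}
    print(projects_per_author(commits))      # {'ann': 2, 'bob': 1, 'cy': 1}
    print(yearly_loc_by_language(commits))
```

Summing each author's lines within a year before averaging keeps the unit at "lines of code per programmer per year", so an author who makes many small commits counts once per year rather than once per commit. Sets are used so that repeated commits by the same author to the same project are not double-counted.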